ABSTRACT

We develop the maximum-entropy weak shear mass reconstruction method presented 
in earlier papers by taking each background galaxy image shape as an independent 
estimator of the reduced shear field and incorporating an intrinsic smoothness into 
the reconstruction. The characteristic length scale of this smoothing is determined by 
Bayesian methods. Within this algorithm the uncertainties due to both the intrinsic
distribution of galaxy shapes and galaxy shape estimation are carried through to
the final mass reconstruction, and the mass within arbitrarily shaped apertures can 
be calculated with corresponding uncertainties. We apply this method to two clusters
taken from N-body simulations using mock observations corresponding to Keck
LRIS and mosaiced HST WFPC2 fields. We demonstrate that the Bayesian choice 
of smoothing length is sensible and that masses within apertures (including one on 
a filamentary structure) are reliable, provided the field of view is not too small. We 
apply the method to data taken on the cluster MS1054-03 using the Keck LRIS (Clowe 
et al. 2000) and HST (Hoekstra et al. 2000), finding results in agreement with this 
previous work; we also present reconstructions with optimal smoothing lengths, and 
mass estimates which do not rely on any assumptions of circular symmetry. The code 
used in this work (LensEnt2) is available from the web. 

Key words: methods: data analysis - galaxies: clusters: general - cosmology: theory 
- dark matter - gravitational lensing


1 INTRODUCTION 

Weak lensing studies of clusters of galaxies are an important 
complement to X-ray, Sunyaev-Zel’dovich effect and optical 
observations, allowing the projected distribution of mass to 
be investigated without any dynamical assumptions. The 
reconstruction of cluster mass distributions from weak gravitational
lensing data is now well established; it has been
shown that the projected density distribution can be recovered
from magnification data, in the form of background
galaxy number densities (Broadhurst et al. 2001; Dye & Taylor
1998), or from shear data, the net statistical distortion of
the images of background galaxies (Tyson et al. 1990; Kaiser &
Squires 1993; Schneider & Seitz 1995; Squires & Kaiser
1996).


We focus here on shear data, primarily because of its 
greater abundance; the likelihood function for shear data 
is also better understood (Section 2). Schneider, King and
Erben (2000) discuss the use of the two types of weak gravitational
lensing data.

Reconstruction methods using shear data fall into two 
classes: direct and iterative inverse methods. The direct 
methods are based on the pioneering work of Kaiser and 
Squires (1993, KS93); many improvements have since been
made to the original algorithm (Schneider & Seitz 1995;
Kaiser 1995; Bartelmann 1995; Squires & Kaiser 1996). In all
these methods the galaxy shape data have to be smoothed
before their input to the algorithm; the smoothing length is
a parameter that is left undetermined. The class of iterative
methods aims to find the mass or projected gravitational potential
map that best fits the data (Squires & Kaiser 1996;
Bartelmann et al. 1996; Seitz et al. 1998). These methods
are well suited to irregularly shaped observations, since they 
do not suffer from edge effects in the same way as the direct 
methods; however, they need to be regularised in some way 
to prevent over-fitting the data, and it remains unclear how 
best to determine the resolution of either the data bins or 
the reconstruction grid. 

2 P.J. Marshall et al.

In two earlier papers (Bridle et al. 1998, Paper I; Bridle
et al. 2001, Paper II) we presented a maximum-entropy
inverse method for reconstructing the mass distribution in
clusters using shear and/or magnification data. In this paper
we extend our method to give a fuller Bayesian analysis.
As noted by other authors (Seitz et al. 1998), it would be
desirable to work with each background galaxy shape individually,
rather than binning or smoothing the data. This
issue, together with the problem of the angular resolution of
the reconstruction, is addressed by our extended algorithm.
We apply our improved method to both realistic synthetic
data, and previously published data for the high-redshift
cluster MS1054-03. As with any Bayesian analysis, the aim
is to derive and interpret the full posterior probability distribution
of the quantity being inferred (in this case the mass
distribution and any associated parameters). This approach
will provide us not just with a mapping procedure, but also
valuable insight into the quality of the data itself.

The method is reviewed and further developed in Section 2
and is applied to simulated data in Section 3. Section 4
contains the results of our method applied to the
well-documented cluster MS1054-03, and gives a brief comparison
with the previously published work. Our conclusions
are presented in Section 5.


2 METHOD 


The basis of the weak lensing reconstruction method described
here is essentially that of Paper I; this section
presents several developments in the algorithm and its implementation.

A trial mass distribution Σ(θ) is used to generate a
predicted reduced shear field g(θ) through the convolution
(Kaiser & Squires 1993, Paper I)

    g(\theta) = \frac{1}{1 - \kappa(\theta)} \, \frac{1}{\pi} \int D(\theta - \theta') \, \kappa(\theta') \, d^2\theta'    (1)

where the convergence κ(θ) = Σ(θ)/Σ_crit and Σ_crit is a
factor dependent on the lens and source redshifts.

By design the lensing convolution kernel D is a complex
quantity that picks out the two types of lensing distortion
g_1 = Re(g) and g_2 = Im(g). Unbiased estimates of these
components of reduced shear are given by the ensemble average
of the background galaxy image ellipticity parameters
ε_1 and ε_2 (Schramm & Kayser 1995).

As in Papers I and II, we aim to reconstruct the projected
mass density of the lens defined on a grid of square
pixels, where the observing region occupies a smaller area
within this grid. This allows for the fact that the mass outside
the observed field affects the shear data inside. It has
been noted (Seitz et al. 1998) that reconstructing the projected
lensing potential allows a purely local estimate of the
mass distribution to be derived, by numerical differentiation
of the potential. This last step involves throwing away
a small amount of information, namely that which describes the
mass distribution outside the observing field. Although, as
Seitz et al. point out, this information is limited, we prefer
to include it for completeness. In most cases,
the cluster being studied will lie completely within the observing
field and the two reconstruction approaches should
produce indistinguishable results; it is then a matter of taste
as to which quantity is inferred. Since here we are interested
in the masses of clusters, we choose to reconstruct the
surface mass density directly, leading to simply-estimated
projected masses with well-understood derived uncertainties.


2.1 Using individual galaxy shapes 


In Paper I, the predicted reduced shear was compared with
measured galaxy ellipticities averaged in coarse grid cells.
Following Seitz et al. (1998), we prefer to use each galaxy
shape individually, as an independent estimator of the reduced
shear. This procedure removes the potential problem of the
bin boundaries affecting the inferred mass distribution, allows
for optimal angular resolution in the reconstruction,
and leaves the data in as pure a form as possible. The reconstruction
grid pixel size is chosen to have approximately
1 galaxy per pixel, leading to comparable numbers of data
points and fitted parameters. However, each data point has
a very low signal-to-noise ratio, indicating that the number
of parameters should be reduced in some way; this issue
is addressed in the next section. The convolution of equation (1)
is performed using Fast Fourier Transforms, and the resulting
reduced shear field is interpolated onto the background
galaxy positions.
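As an illustration of the forward step just described, the following minimal sketch evaluates the convolution of equation (1) with FFTs and interpolates the result onto galaxy positions. It is not the LensEnt2 implementation: it assumes a periodic grid in pixel units, the standard Fourier-space form of the Kaiser & Squires kernel, and bilinear interpolation, and the function names are hypothetical.

```python
import numpy as np

def reduced_shear_from_convergence(kappa):
    """Predict g = gamma / (1 - kappa) on the grid from a convergence map.

    Assumption: periodic boundaries, so the map should be padded well
    beyond the observed field before calling this function.
    """
    ny, nx = kappa.shape
    k1 = np.fft.fftfreq(nx)[np.newaxis, :]   # wavenumber along x (axis 1)
    k2 = np.fft.fftfreq(ny)[:, np.newaxis]   # wavenumber along y (axis 0)
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0                          # avoid 0/0 at the zero mode
    # Fourier-space shear kernel: gamma_hat = kernel * kappa_hat
    kernel = ((k1**2 - k2**2) + 2j * k1 * k2) / ksq
    kernel[0, 0] = 0.0                       # a uniform sheet produces no shear
    gamma = np.fft.ifft2(kernel * np.fft.fft2(kappa))
    return gamma / (1.0 - kappa)             # complex field g_1 + i g_2

def interpolate_to_positions(field, x, y):
    """Bilinear interpolation of a gridded field at pixel coordinates (x, y)."""
    x0 = np.clip(np.floor(x).astype(int), 0, field.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, field.shape[0] - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * field[y0, x0]
            + fx * (1 - fy) * field[y0, x0 + 1]
            + (1 - fx) * fy * field[y0 + 1, x0]
            + fx * fy * field[y0 + 1, x0 + 1])
```

Working with the complex field g keeps both shear components in a single array, mirroring the complex kernel D of equation (1).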

Each of the 2N lensed ellipticity components ε_j of the N
measured background galaxy images is taken as having
been drawn independently from a Gaussian distribution
with mean g_j and variance σ²_intrinsic; here g_j is the true value
of the component of reduced shear at the position of the
galaxy. We can then write the likelihood function as


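    \Pr(\mathrm{Data}|\Sigma) = \frac{1}{Z_L} \exp\left(-\frac{\chi^2}{2}\right)    (2)

where χ² is the usual misfit statistic

    \chi^2 = \sum_{i=1}^{N} \sum_{j=1}^{2} \frac{(\epsilon_{j,i} - g_{j,i})^2}{\sigma^2}    (3)

and the normalisation factor is

    Z_L = (2\pi\sigma^2)^{2N/2}    (4)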


Errors introduced by the galaxy shape estimation
procedure have been included by adding them in
quadrature to the intrinsic ellipticity dispersion (Hoekstra
et al. 2000),

    \sigma = \sqrt{\sigma_{\rm obs}^2 + \sigma_{\rm intrinsic}^2}    (5)

This approximation rests on the assumption that both the
shape estimation error and the unlensed ellipticity distributions
are fitted well by Gaussians, and that the applied
reduced shear is not too large. We follow Schneider et al.
(2000) and correct the width of the ellipticity distributions
by a factor of (1 − |g|²) to account for the non-linearity
in the lensing transformation (equation 12 below). We are
concerned here with sub-critical clusters for which this correction
is small; in principle, the likelihood may be
refined to include other effects as well. In practice, we find
that this particular correction makes little difference to the
reconstructions.
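The likelihood of equations (2) to (5) can be sketched in a few lines. This is an illustrative version only: it assumes complex arrays holding the measured ellipticities and the predicted reduced shear at each galaxy, it omits the (1 − |g|²) width correction, and the function names and default dispersions (taken from the simulations described later) are not part of the original code.

```python
import numpy as np

def combined_sigma(sigma_intrinsic, sigma_obs):
    """Equation (5): add the two dispersions in quadrature."""
    return np.hypot(sigma_obs, sigma_intrinsic)

def log_likelihood(eps, g, sigma_intrinsic=0.25, sigma_obs=0.15):
    """Gaussian log-likelihood of the 2N ellipticity components.

    eps : complex array of measured ellipticities (eps_1 + i eps_2)
    g   : complex array of predicted reduced shear at the same galaxies
    """
    sigma = combined_sigma(sigma_intrinsic, sigma_obs)
    diff = np.asarray(eps) - np.asarray(g)
    resid = np.concatenate([diff.real, diff.imag])           # 2N residuals
    chi2 = np.sum(resid**2) / sigma**2                       # equation (3)
    log_z = 0.5 * resid.size * np.log(2 * np.pi * sigma**2)  # ln Z_L, equation (4)
    return -0.5 * chi2 - log_z                               # ln of equation (2)
```

Working with the logarithm avoids numerical underflow when N is of order a thousand galaxies.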
2.2 The ICF and Bayesian evidence

Our inferences of the distribution of Σ in the cluster are
based on the posterior probability distribution given by
Bayes’ theorem:

    \Pr(\Sigma|\mathrm{Data}) = \frac{\Pr(\mathrm{Data}|\Sigma)\,\Pr(\Sigma)}{\Pr(\mathrm{Data})}    (6)

In Paper I, an entropic prior Pr(Σ) was introduced for this
positive additive distribution; maximisation of Pr(Σ|Data)
then reduces to minimisation of the function F = χ²/2 − αS,
where S is the entropy function for the distribution. At
this point the method is essentially an entropy-regularised
maximum-likelihood technique, similar to that published
elsewhere by Seitz, Schneider and Bartelmann (1998).

This approach contains an implicit assumption that the
values of Σ are uncorrelated. However, we expect clusters of
galaxies to have smooth, extended projected mass distributions,
and wish to include this knowledge in our analysis.
The Intrinsic Correlation Function (ICF) formalism
(Gull 1989; Robinson 1992) allows us to do exactly that;
the physical distribution Σ is expressed as the convolution
of a ‘hidden’ distribution with a broad kernel (the ICF). In
this way the smoothing, which is always necessary at some
stage when using such noisy data, is transferred from the
data to the reconstruction process itself. The large number
of free parameters in the model (the hidden pixel values) is
effectively reduced by this smoothing to a number appropriate
to the quality of the data. In this way the properties
of the noise can be carried through in a calculable, if non-linear,
fashion. Seitz et al. (1998) incorporate smoothing in
their reconstruction scheme, but in an iterative way, making
error estimation non-trivial.
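The relation between the hidden and physical distributions can be sketched as a single convolution. The example below is a hedged illustration, not the MEMSYS4 implementation: it assumes a circularly-symmetric Gaussian ICF specified by its FWHM in pixels, periodic boundaries, and a hypothetical function name.

```python
import numpy as np

def apply_gaussian_icf(hidden, fwhm_pix):
    """Convolve a hidden map with a unit-sum circular Gaussian ICF.

    hidden   : 2D array of (positive) hidden pixel values
    fwhm_pix : full width at half maximum of the ICF, in pixels
    """
    ny, nx = hidden.shape
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ky = np.fft.fftfreq(ny)[:, np.newaxis]
    kx = np.fft.fftfreq(nx)[np.newaxis, :]
    # Fourier transform of a unit-area Gaussian: exp(-2 pi^2 sigma^2 k^2)
    icf_hat = np.exp(-2.0 * np.pi**2 * sigma**2 * (kx**2 + ky**2))
    return np.fft.ifft2(np.fft.fft2(hidden) * icf_hat).real
```

Because the kernel has unit sum, the total of the visible map equals that of the hidden map; only the effective number of independent pixels changes with the width.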

We now consider the form of the ICF; its parameterisation
introduces new degrees of freedom into the problem. A
Bayesian analysis should allow the data to dictate the most
suitable ICF, as follows. We expect the most important parameter
of the ICF to be its width. For a given functional
form (e.g. a circularly-symmetric Gaussian), depending on
a single width parameter w, equation (6) reads

    \Pr(\Sigma|\mathrm{Data}, w) = \frac{\Pr(\mathrm{Data}|\Sigma, w)\,\Pr(\Sigma|w)}{\Pr(\mathrm{Data}|w)}    (7)

The width parameter could be chosen to maximise its own
posterior probability Pr(w|Data); this distribution would
certainly be a useful tool in assessing the relative merits
of different ICF widths. Bayes’ theorem again gives

    \Pr(w|\mathrm{Data}) = \frac{\Pr(\mathrm{Data}|w)\,\Pr(w)}{\Pr(\mathrm{Data})}    (8)

Usually, the typical angular scale of the cluster is known to
within at least an order of magnitude so that a uniform
prior for w is appropriate; since Pr(Data) is a constant,
Pr(Data|w) may be used directly to infer w. This value can
be obtained during the reconstruction process by numerically
evaluating the normalising factor in equation (7),

    \Pr(\mathrm{Data}|w) = \int \Pr(\mathrm{Data}|\Sigma, w)\,\Pr(\Sigma|w)\, d\Sigma    (9)


This integral is known as the ‘evidence’; Sivia (1996) and
MacKay (1992) explain its application in data
analysis. The evidence provides an objective discriminator
between ICF widths w, and, indeed, any other parameters
we might choose to include in the reconstruction process.
Comparison of the evidence calculated for different functional
forms of the ICF allows the merits of different smoothing
kernels to be evaluated (see section 3.4 below). Indeed,
the regularisation parameter α (Paper I) is also determined
by maximising the evidence with respect to α (Gull 1989).
Parameters such as α and w may be viewed as ‘nuisance’
parameters, and marginalised over. When the evidence is
sharply peaked at some value, this marginalisation is approximately
equivalent to using the peak value (MacKay
1992).
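To make the use of the evidence concrete, the sketch below turns a set of log-evidence values, one per trial ICF width, into the posterior of equation (8) under a uniform prior on w. The log-evidence numbers are invented purely for illustration; in practice they would come from the reconstruction code at each trial width.

```python
import numpy as np

def width_posterior(log_evidence):
    """Pr(w|Data) on a grid of trial widths, assuming a uniform prior on w."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    p = np.exp(log_evidence - log_evidence.max())
    return p / p.sum()

# Hypothetical values, not results from the paper.
widths = [20, 30, 40, 50, 60, 70, 80, 90]                       # arcsec
log_ev = [-40.0, -25.0, -14.0, -7.0, -2.5, 0.0, -1.0, -4.0]
posterior = width_posterior(log_ev)
best_width = widths[int(np.argmax(posterior))]
```

When the posterior is sharply peaked, using `best_width` alone is approximately equivalent to marginalising over w, as noted above.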


Interpolation of a fine grid of predicted reduced shear
values onto the galaxy positions retains the potential to obtain
high angular resolution reconstructions; inclusion of an
ICF effectively reduces the number of independent pixels to
one more appropriate to the quality of the data at hand. An
increase in the number of pixels in the working grids and the
inclusion of an extra convolution calls for a faster numerical
algorithm than that used in Papers I and II; we have utilised
the commercially available software MEMSYS4, developed
by MaxEnt Data Consultants Ltd. This code is widely used
in the image processing community and has been proven to
be highly stable; details of the numerical algorithms can be
found in Gull (1990).


2.3 Quantitative mass mapping 

A side-effect of smoothing data prior to an inversion, or of
incorporating an intrinsic correlation function as described
above, is the introduction of pronounced correlations between
the errors on each reconstruction pixel value. However,
calculation of the (Gaussian approximation to the) full
covariance matrix of the errors on the Σ distribution (Paper
I) successfully accounts for these correlations when calculating
integrals over the reconstruction. This is of particular
interest for the direct estimation of the total projected
mass within an aperture from the reconstruction map. Information
on the shape of the aperture is contained within
a vector of weights c_i. The weight c_i is equal to zero if
the pixel lies completely outside the aperture; otherwise
c_i = A_i, where A_i is the area of the pixel (in square
parsecs at the cluster) lying within the aperture. We then
approximate the integral by a weighted sum of pixel values
to produce the mass estimate (M ± σ_M), where

    M = \sum_i c_i \Sigma_i    (10)

and

    \sigma_M^2 = \sum_{i,j} c_i c_j V_{ij}    (11)


Here V_ij is the covariance matrix of the reconstruction errors
in each pixel (Paper I). Whilst a parameterised fit (Schneider
et al. 2000; King & Schneider 2001) may be more physically
motivated, this mass estimation procedure provides a quantitative
result that can be used to guide further analysis. The
aperture can be any shape, and so can be tailored to match
the investigation in hand. Also, calculation of a realistic error
on the mass estimates allows, within the Gaussian approximation,
estimation of the significance of features in the
maps, without recourse to the peaks analysis or resampling
methods advocated elsewhere (van Waerbeke 2000; Erben
et al. 2000; Hoekstra et al. 2001).

Figure 1. Projected mass distribution of two simulated clusters. Left: CL10, a massive cluster at redshift 0.2. Right: CL08, a smaller
cluster at redshift 0.78. The grey scale is Σ / (M⊙ pc^-2); the contours are spaced in steps of 500 h M⊙ pc^-2 for CL10 and 300 h M⊙
pc^-2 for CL08. Also marked are the apertures used for mass estimation from the reconstructed maps (dotted) and the mock observing
region (dashed).
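The aperture mass estimate of equations (10) and (11) amounts to a weighted sum and a quadratic form. The sketch below assumes the map and weights are stored as 2D arrays and the covariance as a dense matrix over the flattened pixels; the function name is hypothetical, and a dense covariance is only practical for small grids.

```python
import numpy as np

def aperture_mass(sigma_map, cov, weights):
    """Mass and 1-sigma error inside an aperture of arbitrary shape.

    sigma_map : 2D array of reconstructed surface density, Sigma_i
    cov       : (npix, npix) covariance matrix V_ij of the pixel errors
    weights   : 2D array c_i, the pixel area inside the aperture (0 outside)
    """
    c = np.asarray(weights, dtype=float).ravel()
    mass = c @ np.asarray(sigma_map, dtype=float).ravel()   # equation (10)
    variance = c @ np.asarray(cov, dtype=float) @ c         # equation (11)
    return mass, np.sqrt(variance)
```

The off-diagonal elements of V_ij matter here: positive correlations between neighbouring pixels, such as those introduced by the ICF, increase the error on the summed mass relative to the uncorrelated case.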


3 APPLICATION TO SIMULATED DATA


To demonstrate the method outlined in the previous section
we now apply it to two simulated clusters, taken from the
sample generated by Eke, Navarro and Frenk (1998). The
naming of the clusters is retained from that paper, in which
a flat Universe dominated by a cosmological constant was assumed.
The same cosmological parameters (Ω_m = 0.3, Ω_Λ =
0.7) were used to calculate the critical density and angular
diameter distances needed in the lensing analysis below. The
Hubble constant is taken as 100 h km s^-1 Mpc^-1.


3.1 The mass distributions

The two projected mass distributions are shown in Figure 1.
The first, CL10, is at a redshift of 0.2, and has an
X-ray emission-weighted temperature of 4.0 keV; the second,
CL08, is at redshift 0.78 with temperature 2.1 keV.
Neither cluster is extremely massive, having approximate
virial masses of 6 and 1 × 10^14 M⊙ respectively. Overlaid
on these plots are the observing regions corresponding
approximately to the field of view of the Keck Low Resolution
Imaging Spectrograph (for CL10, see e.g. Clowe et al.
2000) and an HST mosaic comprising two WFPC2 pointings
(for CL08). Also plotted are the apertures used to estimate
projected masses for the cluster components. For
CL10, these are circles of radius 0.2 h^-1 Mpc (~ 87 arcsec)
and 0.1 h^-1 Mpc (~ 43 arcsec) centred on the large and small
subclumps respectively, and are referred to as apertures 1
and 2. Aperture 3 is the quadrilateral region between the
subclumps. For CL08 a single aperture is defined, being a
circle of radius 0.25 h^-1 Mpc (~ 48 arcsec). Neither observing
region is particularly large, and the smaller clump in
CL10 is marginally outside the observing field. These clusters
were deliberately chosen to contain ‘interesting’ substructure,
in order to illustrate the angular resolution of the
reconstruction method.
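The angular sizes and critical densities used in this section follow from the adopted cosmology. The sketch below is an independent illustration rather than the code used in the paper: it assumes a flat universe with Ω_m = 0.3 and Ω_Λ = 0.7, a simple trapezoid-rule integration, and a rounded value of c²/(4πG) of about 1.663 × 10^18 M⊙ Mpc^-1. All function names are hypothetical.

```python
import math

C_KM_S = 299792.458           # speed of light in km/s
H0 = 100.0                    # km/s/Mpc, so distances come out in h^-1 Mpc
OMEGA_M, OMEGA_L = 0.3, 0.7   # flat cosmology quoted in the text
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi
C2_OVER_4PIG = 1.663e18       # c^2 / (4 pi G) in M_sun per Mpc (rounded)

def comoving_distance(z, steps=1000):
    """Line-of-sight comoving distance in h^-1 Mpc (trapezoid rule)."""
    dz = z / steps
    inv_e = [1.0 / math.sqrt(OMEGA_M * (1.0 + i * dz) ** 3 + OMEGA_L)
             for i in range(steps + 1)]
    integral = dz * (sum(inv_e) - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_KM_S / H0 * integral

def angular_diameter_distance(z1, z2=None):
    """D_A between z1 and z2 (observer to z1 if z2 is None), in h^-1 Mpc."""
    if z2 is None:
        z1, z2 = 0.0, z1
    return (comoving_distance(z2) - comoving_distance(z1)) / (1.0 + z2)

def arcsec_for_length(length, z):
    """Angle in arcsec subtended by a proper length (h^-1 Mpc) at redshift z."""
    return length / angular_diameter_distance(z) * ARCSEC_PER_RAD

def sigma_crit(z_lens, z_source):
    """Critical surface density in h M_sun pc^-2."""
    d_l = angular_diameter_distance(z_lens)
    d_s = angular_diameter_distance(z_source)
    d_ls = angular_diameter_distance(z_lens, z_source)
    return C2_OVER_4PIG * d_s / (d_l * d_ls) / 1.0e12   # Mpc^-2 to pc^-2
```

Under these assumptions `arcsec_for_length(0.2, 0.2)` and `arcsec_for_length(0.25, 0.78)` should come out close to the 87 and 48 arcsec quoted for the apertures, and `sigma_crit` evaluated at the lens and median source redshifts should be close to the critical densities used for the two simulations.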


3.2 Simulated lensing data 

Mock galaxy ellipticity catalogues were generated using the
CL10 and CL08 mass distributions as follows. The parameters
of the observations described in Clowe et al. (2000)
and Hoekstra et al. (2000) were used to estimate the background
galaxy number densities obtainable in 2 hours of observation
with either the Keck LRIS or HST. The median
galaxy redshift was estimated from the Hubble Deep Field
photometric redshift catalogue (Fernandez-Soto et al. 1999),
and then used to calculate an approximate value of Σ_crit.
Systematic effects due to the unknown redshift distribution
of background galaxies (Fischer & Tyson 1997) are not considered
here. The projected mass distributions of Figure 1
were then converted to convergence and equation (1) was applied
to produce maps of the components of reduced shear g
on the same grid. These maps were then interpolated onto
galaxy positions drawn at random from within the observing
field. The components of the intrinsic ellipticity of the
sources, ε^s, were then drawn from a Gaussian distribution
of width σ_intrinsic = 0.25 and transformed to their lensed
counterparts using the relation (Seitz & Schneider 1997)

    \epsilon = \frac{\epsilon^s + g}{1 + g^* \epsilon^s}    (12)

where the asterisk denotes complex conjugation. Uncertainties
introduced in the estimation of galaxy shapes were included
by adding Gaussian noise with σ_obs = 0.15. This
is a reasonably pessimistic approach, with only the fainter
galaxies detected having errors of this size associated with
their ellipticities (Hoekstra et al. 2000; Bacon et al. 2001;
Bridle et al. 2002). The limiting magnitudes quoted should
therefore be taken as those down to which image shape measurements
could be made to this accuracy. The parameters
of the mock observations are given in Table 1.

Figure 2. Top: Reconstructed mass distributions for the cluster CL10. Left to right, the ICF width parameter w increases from 20 to
70 arcsec. Bottom: KS93 direct inversions for comparison; the shear data were smoothed with a Gaussian of FWHM equal to w in each
case. In all plots the contours show surface density in steps of 500 h M⊙ pc^-2. The maximum on the density scale corresponds to a
convergence of 0.61.
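The catalogue generation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: each ellipticity component is drawn independently from a Gaussian with the quoted widths, the reduced shear at each galaxy is supplied as a complex array with |g| < 1, and the function names and fixed random seed are hypothetical choices for reproducibility.

```python
import numpy as np

def lens_ellipticities(eps_s, g):
    """Equation (12): map intrinsic complex ellipticities to lensed ones."""
    return (eps_s + g) / (1.0 + np.conj(g) * eps_s)

def mock_catalogue(g_at_galaxies, sigma_intrinsic=0.25, sigma_obs=0.15, seed=0):
    """Simulated measured ellipticities for galaxies with reduced shear g."""
    rng = np.random.default_rng(seed)
    n = g_at_galaxies.size
    # Intrinsic source shapes: independent Gaussian components.
    eps_s = sigma_intrinsic * (rng.standard_normal(n)
                               + 1j * rng.standard_normal(n))
    eps = lens_ellipticities(eps_s, g_at_galaxies)
    # Shape measurement error, added after lensing.
    noise = sigma_obs * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return eps + noise
```

Averaging the output over many galaxies with the same g recovers g to within the noise, which is the sense in which each galaxy shape acts as an estimator of the reduced shear.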


3.3 Results 

Reconstructions were performed for a range of Gaussian
ICFs with varying FWHM w; the mass distributions were
defined on 128 by 128 pixel grids. Each reconstruction, including
aperture integrated mass estimation, required approximately
two minutes of CPU time on a single R10000
processor of a Silicon Graphics Origin 200.


3.3.1 CL10

Reconstructions with ICF widths of 20 and 70 arcsec are
shown in the top two panels of Figure 2. The greyscale is
plotted with the same limits as used for the true mass distribution
of Figure 1. For comparison, we also show results
for which the mock data were first smoothed, using the same
Gaussian ICFs as the smoothing kernel, and then inverted
directly using the Kaiser & Squires (1993) algorithm to obtain
a convergence map; results are plotted in the lower
panel of the same figure. Here the contours are simply of
the surface density, obtained by scaling the convergence by
the relevant value of Σ_crit; the greyscale is again the same as
Figure 1. The grid on which the Fourier transforms were performed
was padded with zeros outside the observing region
to allow a more direct comparison with our method. At each
smoothing scale the maps generated by the two methods
contain recognisably similar structures; Figure 2 illustrates
the way in which the smoothing has been moved from the
data to the reconstruction. Features which differ from the
true mass distribution are similar in both reconstructions,
and are due to the noise realisation.

Table 1. Properties of the simulated 2 hour observations of Section 3.2.

Cluster  z     I_lim  z_s  Σ_crit         κ_peak  A_obs       n_s          N_s
                           (h M⊙ pc^-2)           (arcmin^2)  (arcmin^-2)
CL10     0.2   26.3   1.0  4634           0.61    36          40           1476
CL08     0.78  26.5   1.3  4950           0.38    8.2         100          695

From left to right: cluster name and redshift, limiting I magnitude, median source redshift,
corresponding value of the critical density, cluster peak convergence, observing area, number
density of background galaxies, and the total number of sources in the simulated catalogue.

Figure 3. Pr(w|Data) for the CL10 analysis. The logarithmic
scale corresponds to the joined points, while the solid bars are on
the linear scale.

Compared to the KS93 results, the maximum-entropy 
solutions are preferable as they are maps of inferred physical 
mass (so are necessarily positive), in which the noise on each 
data point has been translated to inferred uncertainties in 
the maps. At low values of the ICF width the high noise in 
the shear data acts to break up the lensing signal, leading 
to false apparent cluster substructure at small scales. The 
presence of many low level spurious features is also due to 
this over-fitting of the data. Note that the lensing signal is 
not diluted by moving to higher values of w, in contrast to
the effect of data smoothing in the KS93 process.

The two maximum-entropy reconstructions in Figure 2
are taken from the posterior probability distribution
Pr(w|Data); as described in Section 2.2 this distribution
is proportional to the numerically evaluated evidence and is
shown in Figure 3. The shape of this graph is typical. The
reduction in probability at large ICF widths is because the
data are poorly fitted by the overly smooth mass distribution.
At the other extreme, small ICF widths are strongly
disfavoured as they effectively increase the number of free
parameters in the fit; this is the ‘Occam’s razor’ factor which
arises naturally from Bayesian model selection analysis (e.g.,
MacKay 1992) and also corresponds to the intuition that
one should neither over- nor under-smooth data. The map corresponding
to the maximum of Pr(w|Data) represents a ‘map
of believable features’, and occurs at w = 70 arcsec, shown
in the right-hand panel of Figure 2. The contours of this
reconstruction trace the two main mass condensations, and
suggest the presence of a bridge of mass between them; no
other significant features are visible. In the absence of further
information about the cluster this map represents the
most probable mass distribution, given the data and our
choice of the functional form of the ICF. The eye is very
sensitive to the high resolution detail of the main peak in
Figure 1; however, the evidence is sensitive to the entire
mass distribution, which is smooth on larger angular scales.

The projected mass within each of the three apertures
shown in Figure 1 was calculated for the preferred 70 arcsec
ICF width. This process was carried out on reconstructions
from 100 galaxy catalogues with the same observational parameters
as in Table 1 but with different galaxy positions
and ellipticities. The results are shown in the histograms
of Figure 4, with statistics from these measurements given
in Table 2. The distributions are satisfyingly symmetrical
about the true values; their widths are in very good agreement
with the mean of the individual 1-sigma errors calculated
from equation (11), shown by the solid error bars. This
demonstrates that the noise present in the galaxy shape data
has been successfully carried through to the inferred quantities.

We note that if there is no error due to the estimation
of the galaxy shapes (i.e. σ_obs = 0) then the error bar on
the mean inferred mass is reduced by approximately ten
per cent, which corresponds roughly to the change in the
combined ellipticity error σ.
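The size of this change can be checked with a line of arithmetic. The worked example below uses the simulation values σ_intrinsic = 0.25 and σ_obs = 0.15 and assumes, as a simplification, that the mass error scales linearly with the combined ellipticity error.

```latex
\sigma = \sqrt{\sigma_{\rm obs}^2 + \sigma_{\rm intrinsic}^2}
       = \sqrt{0.15^2 + 0.25^2} \approx 0.29,
\qquad
\frac{\sigma(\sigma_{\rm obs} = 0)}{\sigma} = \frac{0.25}{0.29} \approx 0.86
```

Under that assumption the combined error falls by roughly 14 per cent when the shape measurement error is removed, of the same order as the reduction of approximately ten per cent found for the mass error bar.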

We can take this analysis one step further and calculate
a mass profile around the larger sub-clump; this is shown in
Figure 5. The mass estimates can be seen to be accurate over
a reasonable range of angular scales, with slight overestimation
as the apertures extend further outside the observing
region.

Our analysis has at no point attempted to account for
the ‘mass sheet degeneracy’ (Falco et al. 1985; Schneider &
Seitz 1995). If the observing field is sufficiently large then
the entropic prior acts to pin down the reconstruction at the
edge of this region (Paper II), constraining any mass sheet
transformation to be small. This control of the mass sheet
degeneracy is the outcome of our choice of a low default
model value to be used in the cross-entropy function (see
Paper II). The value used in all reconstructions in this work
was 100 h M⊙ pc^-2; a lower value was found to leave the reconstruction
maps and mass estimates unaffected. However,
significantly increased model values gave masses overestimated
by some tens of per cent, with mass sheets visibly
present in the reconstructions. With the default model set
suitably low, the residual effect of the mass sheet degeneracy
on the mass estimates in any given reconstruction is small
compared to the uncertainties due to the noise realisation;
this can be seen by comparing the widths of the histograms
to the ensemble-averaged error bars in Figure 4.

Figure 4. Mass estimates for CL10. In each panel a histogram
of mass estimates from the reconstruction maps from 100 realisations
of the background galaxy population is plotted. The
dotted line marks the true value. The point shows the mean mass
estimate; the error bar is the mean inferred error, not the standard
deviation of the histogram. The apertures used are shown
in Figure 1. Top: circle of radius 0.2 h^-1 Mpc centred on the main
cluster; middle: circle of radius 0.1 h^-1 Mpc centred on the subclump;
bottom: quadrilateral region between the two main mass
clumps.

Table 2. Mass estimates for CL10.

Aperture  M_true  ⟨M⟩_100        ⟨σ_M⟩_100
1         1.18    (1.18 ± 0.12)  0.12
2         0.39    (0.36 ± 0.07)  0.06
3         0.28    (0.27 ± 0.06)  0.07
-         0.12    (0.14 ± 0.06)  0.07

All masses are in units of 10^14 M⊙. Left to right, the columns
contain: aperture number (see text), true projected mass within
aperture, the mean and standard deviation of masses estimated
from 70 arcsec reconstructions from 100 noise realisations of the
dataset, and the mean inferred error of these 100 mass estimates.
The final row contains mass estimation results for the third aperture
translated to a region containing apparently very little mass.

Figure 5. Mass profile for the largest sub-clump of cluster CL10.
The points show the mean estimated mass within that radius
while the error bar is the inferred 1-sigma error (averaged over 100
noise realisations). The shaded area shows the 1-sigma dispersion
in the mass estimates over the 100 realisations. The dotted line
marks the true profile. 0.4 h^-1 Mpc corresponds to 170 arcsec.

Interestingly, the higher resolution maps give mass estimates
which are systematically below the true values: such
maps provide closer fits to the noise in the data, which
breaks up the coherent lensing signal leading to an underestimate
of the total mass present. This also illustrates the
way in which the value of w preferred by the evidence is determined
by the fit across the whole image, not just at the
peaks of the mass distribution.

Figure 6. Top: Reconstructed mass distributions for the cluster CL08. Left to right, the ICF width parameter w increases from 20 to
50 arcsec. Contours show surface density in steps of 300 h M⊙ pc^-2. The maximum on the density scale corresponds to a convergence
of 0.38.

Although our mass estimation procedure is simplistic,
it does produce sensible and accurate results, even for mass
condensations located on the edges of the observing region.
The filament-like structure lying between the subclumps of
CL10, which was hinted at in the reconstruction maps of Figure 2,
was successfully detected in that region; its mass was
measured with an uncertainty of ~ 20 per cent. The same
aperture was translated to the North-East by approximately
200 arcsec, to a region containing just 0.12 × 10^14 M⊙.
When the mass estimation analysis was repeated, this value
was also recovered to within the mean inferred error of
~ 40 per cent.


3.3.2 CL08

With a smaller dataset and lower peak convergence, this simulation
presents a more difficult problem. The most probable
Gaussian ICF width is found to be 50 arcsec and the
corresponding reconstruction is shown in Figure 6. The substructure
in the cluster has been smoothed over, and the
peak density is underestimated by a factor of two. Although
a higher resolution reconstruction, which is also shown in
Figure 6, does suggest substructure, this particular noise realisation
does not enable the fine detail of the true cluster
mass distribution to be faithfully recovered. As an illustration,
20 arcsec ICF reconstructions of four different noise realisations
are shown in Figure 7. These reconstructions were
deliberately selected (from a larger sample of 20) to highlight
the extremes of good and bad fortune. The presence
of density peaks in the reconstructions is clearly sensitive
to the particular noise realisation. In contrast, the 50 arcsec
reconstructions of the noise realisations of Figure 7 are all
very similar. In practice, there is only ever one available data
set; for the set analysed in Figure 6 there is very little information
about the two density peaks, but there is a lensing
signal from the broader underlying mass distribution. The
50 arcsec reconstruction is the most probable given the data:
all of the data are used to infer the global noise properties,
which are then reflected in the smooth reconstruction. Given
such data we can say only that cluster CL08 is extended in
the North-South direction, and that there is a slight suggestion
of substructure.

Figure 7. High resolution (w = 20 arcsec) reconstructions of
CL08, from 4 different realisations of the background galaxy population.

Despite the apparently poor quality of the reconstruction, the
shear data are still sensitive to the total mass within an
aperture: the mass estimation histogram of Figure 8 shows
the total projected mass within $0.25\,h^{-1}$ Mpc of the cluster
centre to be well constrained by the data, with the large error
bars (on average $0.2 \times 10^{14}\,M_\odot$) comparing reasonably
well with the width of the histogram ($0.3 \times 10^{14}\,M_\odot$).
This discrepancy reflects the larger impact of the residual
mass sheet degeneracy when smaller observing fields are
used. There is a tendency towards overestimation of the
total mass as the additive noise in the reconstruction
becomes important for this less massive cluster, but this effect
is within the estimated error for the aperture considered
(mean = 0.92, truth = $0.74 \times 10^{14}\,M_\odot$). The effect can be
seen more clearly in Figure 9, the counterpart of the earlier
mass profile figure. Note that the edge of the observing region is at
$0.3\,h^{-1}$ Mpc; in such low signal, high noise, small
observations, extrapolation beyond the immediate vicinity of the
cluster is clearly to be done with care. One issue is that of
the prior on the ICF width: as the ICF width approaches
the size of the observation the effect of the mass sheet
degeneracy will increase.


Figure 8. Mass estimates from 50 arcsec reconstructions of CL08
(see Figure 6). The aperture, shown in Figure 6, is a circle of radius
$0.25\,h^{-1}$ Mpc.


3.4 More advanced analyses 

The above analysis used a circularly symmetric Gaussian
intrinsic correlation function, but there is no a priori
reason why this smoothing kernel should give the best results.
We also experimented with circular top-hat, exponential and
softened isothermal (‘beta’) profiles, and all were found to
give significantly lower values of the evidence for a given data
set than the Gaussian ICF. The softened isothermal profile,
aiming to optimise the fit to the cluster profile, was found to
be worse at suppressing the noise in the outer regions of the
cluster, while the presence of such broad wings introduced
a large systematic overestimation of the mass integrals due
to the mass sheet degeneracy. It is quite possible that the
optimal ICF for the weak lensing reconstruction problem is
not Gaussian, but our experience with these three
alternative functions leads us to expect that any gain in evidence
would be marginal, and the reconstruction for a given ICF
width would be changed very little.

Figure 9. Mass profile for CL08. The points show the mean
estimated mass within that radius while the error bar is the inferred
1-sigma error (averaged over 100 noise realisations). The shaded
area shows the 1-sigma dispersion in the mass estimates over the
100 realisations. The dotted line marks the true profile. $0.4\,h^{-1}$
Mpc corresponds to 80 arcsec.


The argument given above against using the isothermal
profile for an ICF suggests the use of more than one ICF at
a time, allowing multiple resolution scales in the
reconstruction. The reconstruction then consists of a weighted sum
of convolutions of hidden images with varying width ICFs.
This ‘multi-scale maximum-entropy’ method has been
applied to a number of problems (Weir 1992; Bontekoe et al.
1991; McLachlan et al. 2002); it allows high spatial
resolution where the data warrant it. However, when multi-scale
ICFs were applied to the weak lensing problems shown here,
very little increase in evidence was found over the single ICF
reconstructions, and the inferred mass distributions from the
two approaches were indistinguishable. The introduction of
another hidden image increases the size of the hypothesis
space, introducing extra complexity to the reconstruction
which is not justified by the quality of the data.
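
The multi-scale construction described above, a weighted sum of hidden images each convolved with an ICF of different width, can be sketched in one dimension as follows. This is an illustrative toy rather than the code used in this work: the function names, the zero-padded direct convolution, the kernel truncation `half_size` and the example widths and weights are all assumptions.

```python
import math


def gaussian_kernel(width, half_size):
    """Normalised 1-D Gaussian ICF sampled on pixels -half_size..half_size."""
    raw = [math.exp(-0.5 * (i / width) ** 2)
           for i in range(-half_size, half_size + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def convolve(hidden, kernel):
    """Direct convolution with zero padding; output has the length of `hidden`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(hidden)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(hidden):
                acc += weight * hidden[j]
        out.append(acc)
    return out


def multiscale_map(hidden_images, widths, weights, half_size=10):
    """Weighted sum of hidden images, each smoothed by its own Gaussian ICF."""
    result = [0.0] * len(hidden_images[0])
    for hidden, width, weight in zip(hidden_images, widths, weights):
        smooth = convolve(hidden, gaussian_kernel(width, half_size))
        result = [r + weight * s for r, s in zip(result, smooth)]
    return result


# Toy example: two hidden images, each a single spike at the central pixel,
# combined with a narrow and a broad ICF (widths in pixels, made-up values).
hidden_fine = [0.0] * 41
hidden_fine[20] = 1.0
hidden_broad = [0.0] * 41
hidden_broad[20] = 1.0

mass_map = multiscale_map([hidden_fine, hidden_broad],
                          widths=[1.0, 4.0], weights=[0.5, 0.5])
print(max(mass_map), sum(mass_map))
```

Setting one weight to zero recovers the single-ICF case. Each additional hidden image adds a full set of pixel parameters, which is the growth in hypothesis space that the evidence penalises.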


4 APPLICATION TO REAL DATA 

We now apply our maximum-entropy method to real data.
MS1054-03 is a high redshift (z = 0.83) galaxy cluster; X-ray
and dynamical measurements suggest that it has a high
mass ($T_X \sim 10$ keV, Jeltema et al. 2001; $\sigma \sim 1150$ km s$^{-1}$,
van Dokkum 1999). Two sets of weak lensing data have
been analysed. Clowe et al. (2000) produced a catalogue
of 2723 background galaxies from a single Keck LRIS
pointing, with a number density of approximately 50-60
galaxies per square arcminute. They performed a KS93
inversion using a smoothing kernel with a FWHM of
approximately 40 arcsec, and found the cluster to be extended in the
East-West direction; with a smaller smoothing kernel they
found three mass peaks to lower significance. Hoekstra et al.
(2000) measured the ellipticities of 2446 galaxy images from
a deep HST mosaic consisting of 6 interlaced WFPC2 fields.
They achieved a source density of around 80 arcmin$^{-2}$. They
used the maximum probability extension to the KS93
algorithm (Squires & Kaiser 1996) to produce a higher
resolution map (smoothed with a 20 arcsec kernel) showing three
distinct mass peaks.


Figure 10. Reconstructed mass distributions for the cluster MS1054-03. Top: 40 arcsec (left) and 120 arcsec (right) ICF width
reconstructions from the Keck data of Clowe et al. Bottom: 20 arcsec (left) and 80 arcsec (right) ICF width reconstructions from the HST
data of Hoekstra et al. The left-hand panels contain maps with angular resolution corresponding to that of the maps already published;
the right-hand panels show the maximum evidence reconstructions. Contours show surface density in steps of $300\,h\,M_\odot\,\mathrm{pc}^{-2}$. The maps
are centred on the brightest cluster galaxy (BCG) position, marked with a cross.


The maximum entropy reconstructions from these data
sets are shown in Figure 10 for two ICF widths, a low value
of w equal to that used in the previously published analysis,
and the width that maximises Pr(w|Data). All
reconstructions were performed using the Gaussian ICF on a 128 × 128
pixel grid. Hoekstra et al. calculate observational
uncertainties on their estimated galaxy shape parameters and add
them in quadrature to the intrinsic dispersion; we did the
same. No corresponding weighting of galaxies is present in
the Keck dataset.
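
As a small worked example of adding errors in quadrature, the sketch below combines an intrinsic shape dispersion with per-galaxy measurement errors. The numerical values are invented for illustration, and the inverse-variance weight in the last step is an assumption about how such errors would typically enter a chi-squared, not a statement of the weighting used by Hoekstra et al.

```python
import math


def total_shape_error(sigma_intrinsic, sigma_observed):
    """Combine intrinsic and measurement dispersions in quadrature."""
    return math.hypot(sigma_intrinsic, sigma_observed)


sigma_intrinsic = 0.3            # hypothetical intrinsic ellipticity dispersion
sigma_observed = [0.0, 0.1, 0.4]  # hypothetical per-galaxy measurement errors

sigma_total = [total_shape_error(sigma_intrinsic, s) for s in sigma_observed]

# Assumed inverse-variance weights: noisier galaxies count for less.
weights = [1.0 / s ** 2 for s in sigma_total]

for s_obs, s_tot, w in zip(sigma_observed, sigma_total, weights):
    print(f"sigma_obs={s_obs:.2f}  sigma_total={s_tot:.3f}  weight={w:.2f}")
```

A catalogue without measurement errors, such as the Keck dataset here, corresponds to the first entry: every galaxy carries the intrinsic dispersion alone and all weights are equal.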


Both of the high resolution maps are qualitatively very
similar to those given in the referenced papers (where the
data were smoothed with Gaussian kernels of the same
width as the chosen low w ICFs), but are now
positivity-constrained. In particular, the three peaks found by
Hoekstra et al. (2000) are reproduced along with several other
features present towards the edges of their map. The
reconstruction from the Keck data is also similar to the one found by
Clowe et al. (2000).

We now consider the probability distribution of the ICF
width parameter w. This is shown for the two datasets in
Figure 11. In both cases the evidence peaks at a significantly
larger value of w than that used in the high resolution maps.
The maximum evidence reconstructions, these ‘maps of
believable features’, clearly show the absence
of significant structure away from the central cluster region,
and suggest that the quality of the data is such that the
substructure observed in the high resolution maps of
Figure 10 should be interpreted with caution. A measure of the
goodness-of-fit of the two different resolution maps to the
data was obtained by calculating the chi-squared statistic,
with the summation now running over the
galaxies contained within 40 arcsec radii apertures placed
over the three candidate sub-clumps inferred from the HST
data. Reduced chi-squared values were calculated by
dividing by the number of galaxies in the aperture (≈ 280) minus
the number of pixels in the aperture (≈ 100); these are given
in Table 3.


Figure 11. Pr(w|Data) for the MS1054-03 analyses. Top: Keck
data; bottom: HST data.


Table 3. Reduced chi-squared values for each of the three
candidate sub-clumps in MS1054-03.

Sub-clump    $\chi^2_{20}$    $\chi^2_{80}$
West         1.18             1.20
Centre       1.09             1.10
East         1.20             1.26

The subscripts refer to the resolution scale w (in arcsec) of the
reconstruction. Note that in each case the half-width of the reduced
chi-squared distribution is approximately $\sqrt{2 \times 180}/180 \approx 0.1$.


It can be seen from this table that there is only a
marginal improvement in the fit to the data by
decreasing the ICF width; this improvement is heavily outweighed
by the “Occam’s razor” factor present in the Bayesian
evidence, which suggests that the extra complexity in the
inferred mass distribution introduced by using an ICF width
of less than 80 arcsec is not justified by these data alone.
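
The size of this improvement relative to the expected scatter can be checked with a few lines of arithmetic. The sketch below takes the approximate counts quoted in the text (about 280 galaxies and 100 pixels per aperture) and the values of Table 3, assuming that the first column of the table belongs to the 20 arcsec map and the second to the 80 arcsec map. The function and variable names are illustrative only.

```python
import math


def reduced_chi2_half_width(n_galaxies, n_pixels):
    """Approximate 1-sigma scatter of reduced chi-squared, sqrt(2*nu)/nu,
    for nu = n_galaxies - n_pixels degrees of freedom."""
    nu = n_galaxies - n_pixels
    return math.sqrt(2.0 * nu) / nu


half_width = reduced_chi2_half_width(280, 100)

# Table 3 values as (high resolution, maximum evidence); the column
# assignment is an assumption stated in the lead-in.
table3 = {"West": (1.18, 1.20), "Centre": (1.09, 1.10), "East": (1.20, 1.26)}

# Improvement in reduced chi-squared from sharpening the map, expressed in
# units of the expected scatter.
improvement_in_sigma = {
    name: (chi2_smooth - chi2_sharp) / half_width
    for name, (chi2_sharp, chi2_smooth) in table3.items()
}

print(f"half-width = {half_width:.3f}")
for name, value in improvement_in_sigma.items():
    print(f"{name}: improvement = {value:.2f} sigma")
```

Under these assumptions every sub-clump improves by less than one half-width, which is the sense in which the improvement in fit is marginal.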

The two datasets are by no means independent noise
realisations, since they are both observations of the same
background galaxy population. However, the different
observing conditions have clearly introduced different galaxy
shape measurement errors. The resulting difference in the
details of the high resolution maps of Figure 10 apparently
accords with the conclusions drawn from the probability
distribution of the resolution parameter w. However, the
differences will also partly be due to the different weighting of the
images in the two datasets.

As stated before, in the absence of any other
information about the cluster the maximum evidence map
represents the most probable mass distribution given the data;
the additional information present in the cluster galaxy light
and number density distributions (Hoekstra et al. 2000), and
the X-ray surface brightness maps (Donahue et al. 1998;
Jeltema et al. 2001) are clearly very important in the detailed
interpretation of the weak lensing data, and ideally should
be included in a joint analysis.

The projected mass within $0.5\,h^{-1}$ Mpc (94 arcsec in the
adopted cosmology) of the brightest cluster galaxy was
calculated using the method outlined earlier. This was
done for both maps given in the lower panels of Figure 10
and the results compared with that from Hoekstra et al.
(2000) in Table 4. Both the high and the low resolution
mass estimates are consistent with the mass derived by
Hoekstra et al. from the tangential shear in circular apertures
about the brightest cluster galaxy. It is reassuring to note
that the statistical errors on these mass estimates agree very
well with those calculated by aperture mass densitometry
elsewhere. The issue of which resolution reconstruction, and
so which mass estimate, to prefer may depend on prejudices
about the likely level of substructure in this cluster. Given no
other information about MS1054-03 we would conclude that
the quality of the data suggests the 70 arcsec resolution map
as being the more probable, but that there is strong evidence
for substructure in the core of this cluster; in any case the
weak lensing data give a projected mass of M© with 

a statistical error of about 10 per cent. 


5 CONCLUSIONS 

We have developed a Bayesian analysis, based on the 
maximum-entropy method of Bridle et al. (1998, 2001), for 
inferring the distribution of mass in clusters of galaxies from 




Table 4. Mass estimates for MS1054-03.

Reconstruction          $M / 10^{15}\,h^{-1}\,M_\odot$
w = 20                  0.91 ± 0.09
w = 80                  1.21 ± 0.15
Hoekstra et al. 2000    1.07 ± 0.12

All masses refer to the projected mass within $0.5\,h^{-1}$ Mpc of the
BCG (located at the origin) and are in units of $10^{15}\,h^{-1}\,M_\odot$.
Only the HST data are used here, to allow direct comparison with
Hoekstra et al.


weak lensing shear data. We treat each background galaxy
image as a noisy estimator of the reduced shear field of the
cluster, retaining all the information about both signal and
noise and so allowing for high angular resolution. Use of
an ‘intrinsic correlation function’ in the maximum-entropy
formalism provides a way of incorporating our prior
expectation of clusters as smooth, extended objects, and effectively
replaces the data smoothing required by direct
reconstruction methods. In contrast to these methods, the lensing
signal is not diluted by this process. Moreover, analysis of
the posterior probability distribution of the ICF width w,
obtained by numerically evaluating the Bayesian evidence,
provides an objective way of discriminating between
smoothing scales. The map at the peak of this probability
distribution was found not to contain any significant spurious peaks,
and can be interpreted as the safest conclusion to draw from
the data. The higher resolution maps, although representing
an overfit to the data, do contain limited useful information,
particularly with respect to substructure in the cluster; the
fact that their angular resolution is less favoured by the data
quality gives a useful indication of the believability of these
features.
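
To illustrate how evidence values evaluated on a grid of ICF widths translate into a posterior Pr(w|Data), the following sketch normalises a set of log-evidences under a prior on w. The grid of widths, the log-evidence values and the flat default prior are all invented for the example; they are not taken from the analyses in this paper.

```python
import math


def width_posterior(log_evidence, log_prior=None):
    """Normalised posterior over a grid of ICF widths.

    Pr(w|Data) is proportional to Pr(Data|w) Pr(w); the largest log value is
    subtracted before exponentiating to avoid numerical underflow.
    """
    if log_prior is None:
        log_prior = [0.0] * len(log_evidence)  # flat prior (an assumption)
    log_post = [e + p for e, p in zip(log_evidence, log_prior)]
    peak = max(log_post)
    unnormalised = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical grid of widths (arcsec) and made-up log-evidence values.
widths = [20, 40, 60, 80, 100, 120]
log_evidence = [-1012.0, -1006.5, -1003.0, -1001.8, -1002.6, -1004.9]

posterior = width_posterior(log_evidence)
best_width = widths[posterior.index(max(posterior))]

for w, p in zip(widths, posterior):
    print(f"w = {w:3d} arcsec  Pr(w|Data) = {p:.4f}")
print("most probable width:", best_width)
```

Even modest differences in log-evidence produce a sharply peaked posterior, so a high resolution map can be strongly disfavoured while fitting the data only slightly better.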

Simple mass estimates extracted directly from the mass
maps preferred by the evidence were found to be unbiased
and accurate to within the estimated errors over a fairly wide
range of angular scales; these uncertainties agreed very well
with the standard deviation in the mass estimates from 100
different realisations of the background galaxy population.
Noisier observations over a smaller field of view were found
to give mass estimates with a slight bias towards
overestimation, an effect understood in terms of the additivity of the
reconstructed distribution and the mass sheet degeneracy.
Inspection of the variety of structure in reconstructions from
these different noise realisations justified the cautious
interpretation of the high resolution maps, indicating that this
analysis provides a useful way of understanding the noise
properties of the data.

We have applied our method to two galaxy shape
datasets for the high redshift cluster MS1054-03, one derived
from a ground-based observation and the other from an HST
mosaic. In both cases the features found in previously
published maps, obtained by both direct and inverse methods,
are reproduced, but with the added desirable features of
being positivity-constrained and quantitatively useful. Simple
mass estimates extracted directly from the reconstruction
agree well with values found by aperture mass
densitometry, as do the errors estimated by both methods.

The principal function of parameter-free mass maps as
produced by this method is to provide the equivalent of a
mass telescope, allowing images to be generated to aid
further, more quantitative analysis. Parameterised fitting to
shear data with physical motivation has been performed
elsewhere (Schneider et al. 2000; King & Schneider 2001);
we believe that the information provided by our variable
angular resolution analysis is a helpful guide to this process,
whilst also providing reasonably accurate ‘mass photometry’
at the same time. The use of the Bayesian evidence in such
fitting will be addressed in future papers.
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